Text Extraction in Complex Color Document Images for Enhanced Readability

Authors

  • P. Nagabhushan
  • Shivananda Nirmala
Abstract

Documents are often printed with text on a complex color background. The text in such documents is hard to read because the background is complex and the colors of the foreground text blend with the colors of the background. Automatic segmentation of the foreground text in such document images is essential for smooth reading of the contents by either a human or a machine. In this paper we propose a novel approach to extracting the foreground text from color document images with complex backgrounds. The approach is a hybrid that combines connected component analysis with texture feature analysis of potential text regions. It uses the Canny edge detector to detect all possible text edge pixels, and connected component analysis is performed on these edge pixels to identify candidate text regions. Because the background is complex, a non-text region may be mistakenly identified as a text region. This problem is overcome by analyzing the texture features of the potential text region corresponding to each connected component. An unsupervised local thresholding method is devised to perform foreground segmentation in the detected text regions. Finally, noisy text regions are identified and reprocessed to further enhance the quality of the retrieved foreground. The approach can handle document images with varying backgrounds of multiple colors and textures, and foreground text in any color, font, size, and orientation. Experimental results show that the proposed algorithm detects, on average, 97.12% of the text regions in the source document. The readability of the extracted foreground text is illustrated through optical character recognition (OCR) when the text is in English. The proposed approach is compared with some existing methods of foreground separation in document images, and the experimental results show that it performs better.
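
As a rough illustration of two of the stages named in the abstract (connected component analysis on edge pixels, then a per-region threshold), here is a minimal pure-Python sketch. It is not the authors' implementation. The function names are hypothetical, the edge map is hand-made and only stands in for the output of a Canny detector, the region mean is an assumed stand-in for the paper's unsupervised local threshold, and dark text on a lighter background is assumed. The texture-based rejection of non-text regions and the reprocessing of noisy regions are omitted.

```python
from collections import deque


def connected_components(edges):
    """Group 8-connected edge pixels; return one bounding box per group.

    Each box is (top, left, bottom, right), inclusive, in scan order.
    """
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not edges[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited edge pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def local_threshold(gray, box):
    """Binarize one candidate region using a threshold computed from it alone.

    Assumption: the threshold is the region mean, and text is darker than
    its background, so pixels below the mean are marked as foreground (1).
    """
    top, left, bottom, right = box
    rows = range(top, bottom + 1)
    cols = range(left, right + 1)
    pixels = [gray[y][x] for y in rows for x in cols]
    threshold = sum(pixels) / len(pixels)
    return [[1 if gray[y][x] < threshold else 0 for x in cols] for y in rows]


# Toy grayscale image: a dark stroke (40) on a light background (200+).
gray = [
    [200, 200, 200, 200, 200, 200],
    [200,  40, 210,  40, 200, 200],
    [200,  40,  40,  40, 200, 200],
    [200, 200, 200, 200, 200, 200],
]
# Hand-made edge map standing in for Canny output on the image above.
edges = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

boxes = connected_components(edges)
mask = local_threshold(gray, boxes[0])
```

Computing the threshold separately for each candidate region, rather than once for the whole page, is what lets a scheme like this cope with a background whose color changes from one part of the document to another.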

Similar resources

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing it, and detecting and correcting image skew. In document segmentation, an algorith...

Complex Background and Foreground Extraction in Color Document Images using Interval Type-2 Fuzzy

This paper deals with the problem of extracting text information from complex backgrounds in color document images. Developing a general framework for separating the foreground text and background information in complex document images is still a challenging problem because of its high unpredictability and complexity. In this paper a new interval type-2 fuzzy based thresholding method is propos...

A New Text-Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field that has received great attention in recent decades. In this paper, we present an approach to image retrieval based on a combination of text-based and content-based features. Keywords are used as the text-based features, and color and texture are used as the content-based features. A query in this system contains some keywords and an input...

Image Segmentation for Text Extraction

This paper presents a methodology for extracting text from images such as document images and scene images. Text that appears in these images contains important and useful information. Text extraction from images has been used in a large variety of applications, such as mobile robot navigation, document retrieval, object identification, and vehicle license plate detection. In this paper, we emplo...

Journal:
  • Intelligent Information Management

Volume 2

Publication date: 2010